862 research outputs found

Useful Blunders: Can Automated Speech Recognition Errors Improve Downstream Dementia Classification?

Author: Cohen Trevor
Li Changye
Pakhomov Serguei
Xu Weizhe
Publication date: 10/01/2024

Objectives: We aimed to investigate how errors from automatic speech recognition (ASR) systems affect dementia classification accuracy, specifically in the "Cookie Theft" picture description task. We aimed to assess whether imperfect ASR-generated transcripts could provide valuable information for distinguishing between language samples from cognitively healthy individuals and those with Alzheimer's disease (AD). Methods: We conducted experiments using various ASR models, refining their transcripts with post-editing techniques. Both these imperfect ASR transcripts and manually transcribed ones were used as inputs for the downstream dementia classification. We conducted comprehensive error analysis to compare model performance and assess ASR-generated transcript effectiveness in dementia classification. Results: Imperfect ASR-generated transcripts surprisingly outperformed manual transcription for distinguishing between individuals with AD and those without in the "Cookie Theft" task. These ASR-based models surpassed the previous state-of-the-art approach, indicating that ASR errors may contain valuable cues related to dementia. The synergy between ASR and classification models improved overall accuracy in dementia classification. Conclusion: Imperfect ASR transcripts effectively capture linguistic anomalies linked to dementia, improving accuracy in classification tasks. This synergy between ASR and classification models underscores ASR's potential as a valuable tool in assessing cognitive impairment and related clinical applications.
Comment: To appear in Journal of Biomedical Informatics

arXiv.org e-Print Archive

Enhancing clinical concept extraction with distributional semantics

Author: Cohen Trevor
Gonzalez Graciela
Jonnalagadda Siddhartha
Wu Stephen
Publication venue: Elsevier Inc.
Publication date: 01/02/2012

Abstract: Extracting concepts (such as drugs, symptoms, and diagnoses) from clinical narratives constitutes a basic enabling technology to unlock the knowledge within and support more advanced reasoning applications such as diagnosis explanation, disease progression modeling, and intelligent analysis of the effectiveness of treatment. The recent release of annotated training sets of de-identified clinical narratives has contributed to the development and refinement of concept extraction methods. However, as the annotation process is labor-intensive, training data are necessarily limited in the concepts and concept patterns covered, which impacts the performance of supervised machine learning applications trained with these data. This paper proposes an approach to minimize this limitation by combining supervised machine learning with empirical learning of semantic relatedness from the distribution of the relevant words in additional unannotated text.
The approach uses a sequential discriminative classifier (Conditional Random Fields) to extract the mentions of medical problems, treatments and tests from clinical narratives. It takes advantage of all Medline abstracts indexed as being of the publication type “clinical trials” to estimate the relatedness between words in the i2b2/VA training and testing corpora. In addition to the traditional features such as dictionary matching, pattern matching and part-of-speech tags, we also used as a feature words that appear in similar contexts to the word in question (that is, words that have a similar vector representation measured with the commonly used cosine metric, where vector representations are derived using methods of distributional semantics). To the best of our knowledge, this is the first effort exploring the use of distributional semantics, the semantics derived empirically from unannotated text often using vector space models, for a sequence classification task such as concept extraction.
Therefore, we first experimented with different sliding window models and found the model with parameters that led to best performance in a preliminary sequence labeling task.
The evaluation of this approach, performed against the i2b2/VA concept extraction corpus, showed that incorporating features based on the distribution of words across a large unannotated corpus significantly aids concept extraction. Compared to a supervised-only approach as a baseline, the micro-averaged F-score for exact match increased from 80.3% to 82.3% and the micro-averaged F-score based on inexact match increased from 89.7% to 91.3%. These improvements are highly significant according to the bootstrap resampling method and also considering the performance of other systems. Thus, distributional semantic features significantly improve the performance of concept extraction from clinical narratives by taking advantage of word distribution information obtained from unannotated data.

Elsevier - Publisher Connector

PubMed Central

TRESTLE: Toolkit for Reproducible Execution of Speech, Text and Language Experiments

Author: Cohen Trevor
Li Changye
Michalowski Martin
Pakhomov Serguei
Publication date: 14/02/2023

The evidence is growing that machine and deep learning methods can learn the subtle differences between the language produced by people with various forms of cognitive impairment such as dementia and cognitively healthy individuals. Valuable public data repositories such as TalkBank have made it possible for researchers in the computational community to join forces and learn from each other to make significant advances in this area. However, due to variability in approaches and data selection strategies used by various researchers, results obtained by different groups have been difficult to compare directly. In this paper, we present TRESTLE (Toolkit for Reproducible Execution of Speech, Text and Language Experiments), an open source platform that focuses on two datasets from the TalkBank repository with dementia detection as an illustrative domain. Successfully deployed in the hackallenge (Hackathon/Challenge) of the International Workshop on Health Intelligence at AAAI 2022, TRESTLE provides a precise digital blueprint of the data pre-processing and selection strategies that can be reused via TRESTLE by other researchers seeking comparable results with their peers and current state-of-the-art (SOTA) approaches.
Comment: Accepted at AMIA Informatics Summit

arXiv.org e-Print Archive

CELLS: A Parallel Corpus for Biomedical Lay Language Generation

Author: Cohen Trevor
Guo Yue
Leroy Gondy
Qiu Wei
Wang Sheng
Publication date: 07/11/2022

Recent lay language generation systems have used Transformer models trained on a parallel corpus to increase health information accessibility. However, the applicability of these models is constrained by the limited size and topical breadth of available corpora. We introduce CELLS, the largest (63k pairs) and broadest-ranging (12 journals) parallel corpus for lay language generation. The abstract and the corresponding lay language summary are written by domain experts, assuring the quality of our dataset. Furthermore, qualitative evaluation of expert-authored plain language summaries has revealed background explanation as a key strategy to increase accessibility. Such explanation is challenging for neural models to generate because it goes beyond simplification by adding content absent from the source. We derive two specialized paired corpora from CELLS to address key challenges in lay language generation: generating background explanations and simplifying the original abstract. We adopt retrieval-augmented models as an intuitive fit for the task of background explanation generation, and show improvements in summary quality and simplicity while maintaining factual correctness. Taken together, this work presents the first comprehensive study of background explanation for lay language generation, paving the path for disseminating scientific knowledge to a broader audience. CELLS is publicly available at: https://github.com/LinguisticAnomalies/pls_retrieval

arXiv.org e-Print Archive

Embedding Probabilities in Predication Space with Hermitian Holographic Reduced Representations

Author: Dominic Widdows
Trevor Cohen
Publication date: 11/04/2020

Abstract: Predication-based Semantic Indexing (PSI) is an approach to generating high-dimensional vector representations of concept-relation-concept triplets. In this paper, we develop a variant of PSI that accommodates estimation of the probability of encountering a particular predication (such as fluoxetine TREATS major depressive disorder) in a collection of predications concerning a concept of interest (such as major depressive disorder). PSI leverages reversible vector transformations provided by representational approaches known as Vector Symbolic Architectures (VSA). To embed probabilities we develop a novel VSA variant, Hermitian Holographic Reduced Representations, with improvements in predictive modeling experiments. The probabilistic interpretation this facilitates reveals previously unrecognized connections between PSI and quantum theory, perhaps most notably that PSI's estimation of relatedness across multiple reasoning pathways corresponds to the estimation of the probability of traversing indistinguishable pathways in accordance with the rules of quantum probability.

CiteSeerX

EpiphaNet: An Interactive Tool to Support Biomedical Discoveries

Author: Cohen Trevor
Mukund Kavitha
Rindflesch Thomas
Schvaneveldt Roger W
Whitfield G Kerr
Publication venue: University of Illinois at Chicago Library
Publication date: 21/09/2010

Background: EpiphaNet (http://epiphanet.uth.tmc.edu) is an interactive knowledge discovery system, which enables researchers to explore visually sets of relations extracted from MEDLINE using a combination of language processing techniques. In this paper, we discuss the theoretical and methodological foundations of the system, and evaluate the utility of the models that underlie it for literature-based discovery. In addition, we present a summary of results drawn from a qualitative analysis of over six hours of interaction with the system by basic medical scientists. Results: The system is able to simulate open and closed discovery, and is shown to generate associations that are both surprising and interesting within the area of expertise of the researchers concerned. Conclusions: EpiphaNet provides an interactive visual representation of associations between concepts, which is derived from distributional statistics drawn from across the spectrum of biomedical citations in MEDLINE. This tool is available online, providing biomedical scientists with the opportunity to identify and explore associations of interest to them.

University of Illinois at Chicago: Journals@UIC

PubMed Central

Students’ perceptions of school acoustics and the impact of noise on teaching and learning in secondary schools : findings of a questionnaire survey

Author: Bridget M. Shield
Daniel M. Connolly
Julie E. Dockrell
Rob Conetta
Trevor J. Cox
Publication venue: Elsevier BV
Publication date: 30/11/2015

This paper will present the design and findings of an online questionnaire survey of 11–16 year olds’ impressions of their school's acoustic environment, and of an experimental study into the effects of typical levels of classroom noise on adolescents' performance on numeracy and cognitive functioning tasks. Analysis of the responses to the questionnaire found that pupils who reported additional learning needs such as hearing impairment, speaking English as an additional language or receiving learning support reported being significantly more affected by poor school acoustics than pupils reporting no additional learning needs. Pupils attending suburban schools featuring cellular classrooms that were not exposed to nearby noise sources were more positive about their school acoustics than pupils at schools with open plan classroom designs or attending schools that were exposed to external noise sources. The study demonstrates that adolescents are reliable judges of their school's acoustic environment, and have insight into the disruption to teaching and learning caused by poor listening conditions. Furthermore, pupils with additional learning needs are more at risk from the negative effects of poor school acoustics.

University of Salford Institutional Repository

Elsevier - Publisher Connector

Crossref

Maternal obesity reduces placental autophagy marker expression in uncomplicated pregnancies

Author: Cohen Matthew
de Vrijer Barbra
Eastabrook Genevieve
Guo Emily
Pucchio Aidan
Shepherd Trevor G
Publication venue: Scholarship@Western
Publication date: 01/08/2020

AIM: Obesity has been associated with changes in autophagy and its increasing prevalence among pregnant women is implicated in higher rates of placental-mediated complications of pregnancy such as pre-eclampsia and intrauterine growth restriction. Autophagy is involved in normal placentation, thus changes in autophagy may lead to impaired placental function and development. The aim of this study was to investigate the connection between obesity and autophagy in the placenta in otherwise uncomplicated pregnancies. METHODS: Immunohistochemistry and western blot analysis were done on placental and omental samples from obese (body mass index [BMI] ≥30 kg/m RESULTS: As pre-pregnancy BMI increased, there was an increase in both placental and fetal weight as well as decreased levels of LC3B in the central region of the placenta (P = 0.0046). Within the obese patient group, LC3B levels were significantly decreased in the placentas of male fetuses compared to females (P < 0.0001). Adipocytes, compared to milky spots and vasculature, had lower levels of p62 (P = 0.0127) and LC3B (P = 0.003) in obese omenta and lower levels of LC3B in control omenta (P = 0.0071). CONCLUSION: Obesity leads to reduced placental autophagy in uncomplicated pregnancies; thus, changes in autophagy may be involved in the underlying mechanisms of obesity-related placental diseases of pregnancy.

Scholarship@Western